Cross-modal chiastopic-fusion attention network for visual question answering
Mao WANG, Yaxiong PENG, Anjiang LU
Journal of Computer Applications, 2022, 42(3): 854-859. DOI: 10.11772/j.issn.1001-9081.2021030470

To improve the accuracy of Visual Question Answering (VQA) models on complex image questions, a Cross-modal Chiastopic-fusion Attention Network (CCAN) was proposed. Firstly, an improved residual channel self-attention method was proposed to attend to the image and locate important regions according to its global information; on this basis, a new joint attention mechanism combining word attention and image-region attention was introduced. Secondly, a cross-modal chiastopic-fusion network was proposed to generate multiple features that integrate the two dynamic information flows, producing an effective attention flow in each modality; the joint features were combined by element-wise multiplication. In addition, parameters were shared between the networks to avoid an increase in computational cost. Experimental results on the VQA v1.0 dataset show that the proposed model reaches an accuracy of 67.57%, which is 2.97 percentage points higher than that of the MLAN (Multi-Level Attention Network) model and 1.20 percentage points higher than that of the CAQT (Co-Attention network with Question Type) model. These results verify the effectiveness and robustness of the proposed method in improving VQA accuracy.
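The fusion step the abstract outlines (image-region attention, word attention, element-wise multiplication of the attended features, with shared parameters to limit cost) can be sketched roughly as below. This is a minimal illustrative sketch only: the feature dimensions, layer names, and fusion order are assumptions, not the authors' released implementation of CCAN.

```python
# Minimal sketch of joint attention with element-wise fusion
# (dimensions and module names are assumed for illustration).
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointAttentionFusion(nn.Module):
    """Attend over image regions and question words, then fuse the two
    attended summaries by element-wise multiplication."""
    def __init__(self, img_dim=2048, q_dim=1024, hid_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hid_dim)
        self.q_proj = nn.Linear(q_dim, hid_dim)
        # One scoring layer shared by both branches, echoing the abstract's
        # parameter sharing to avoid extra computational cost.
        self.score = nn.Linear(hid_dim, 1)

    def attend(self, feats, proj):
        h = torch.tanh(proj(feats))           # (B, N, hid)
        w = F.softmax(self.score(h), dim=1)   # (B, N, 1) attention weights
        return (w * h).sum(dim=1)             # (B, hid) attended summary

    def forward(self, img_regions, q_words):
        v = self.attend(img_regions, self.img_proj)  # image-region attention
        q = self.attend(q_words, self.q_proj)        # word attention
        return v * q                                 # element-wise joint feature

# Usage: batch of 8 images with 36 region features and 14 word features.
fused = JointAttentionFusion()(torch.randn(8, 36, 2048), torch.randn(8, 14, 1024))
```

Element-wise multiplication keeps the joint feature at the same dimensionality as each attended branch, which is why it is a common low-cost choice for combining two modalities compared with concatenation or bilinear pooling.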
